benchmark and algorithm
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Modern vision models are trained on very large, noisy datasets. While these models acquire strong capabilities, they may not follow the user's intent and produce the desired results in certain respects, e.g., visual aesthetics, preferred style, and responsibility. In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system. Advanced retrieval systems usually adopt a cascade of aesthetic models as re-rankers or filters, which are limited to low-level features like saturation and perform poorly when stylistic, cultural, or knowledge contexts are involved. We find that utilizing the reasoning ability of large language models (LLMs) to rephrase the search query and extend the aesthetic expectations can make up for this shortcoming.
Benchmarks and Algorithms for Offline Preference-Based Reward Learning
Shin, Daniel, Dragan, Anca D., Brown, Daniel S.
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization via offline RL, our observation is that it can be a surprisingly rich source of information for preference learning as well. We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning, learns a distribution over reward functions, and optimizes a corresponding policy via offline RL. Crucially, our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps. To test our approach, we first evaluate existing offline RL benchmarks for their suitability for offline reward learning. Surprisingly, for many offline RL domains, we find that simply using a trivial reward function results in good policy performance, making these domains ill-suited for evaluating learned rewards. To address this, we identify a subset of existing offline RL benchmarks that are well suited for offline reward learning and also propose new offline apprenticeship learning benchmarks which allow for more open-ended behaviors. When evaluated on this curated set of domains, our empirical results suggest that combining offline RL with learned human preferences can enable an agent to learn to perform novel tasks that were not explicitly shown in the offline data.
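The reward-learning loop described in this abstract (pool-based query selection from offline data, then fitting a distribution over reward functions) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: a linear reward over made-up 2-D segment features, a Bradley-Terry preference model, a bootstrapped ensemble standing in for the distribution over rewards, ensemble variance as the query-selection score, and a simulated labeler in place of a human. The offline RL policy-optimization step is omitted.

```python
import math
import random


def predict_pref(w, fa, fb):
    """Bradley-Terry probability that segment a is preferred over segment b."""
    diff = sum(wi * (xa - xb) for wi, xa, xb in zip(w, fa, fb))
    diff = max(-30.0, min(30.0, diff))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-diff))


def train_member(prefs, dim, rng, epochs=100, lr=0.5):
    """Fit one linear reward by SGD on the preference log-likelihood."""
    w = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    for _ in range(epochs):
        for fa, fb, label in prefs:
            grad = label - predict_pref(w, fa, fb)
            for i in range(dim):
                w[i] += lr * grad * (fa[i] - fb[i])
    return w


def pick_query(ensemble, pool):
    """Return the pooled pair on which the ensemble members disagree most."""
    def disagreement(pair):
        fa, fb = pair
        probs = [predict_pref(w, fa, fb) for w in ensemble]
        mean = sum(probs) / len(probs)
        return sum((p - mean) ** 2 for p in probs) / len(probs)
    return max(pool, key=disagreement)


def run_demo(seed=0, n_segments=30, n_queries=10, n_members=5):
    rng = random.Random(seed)
    true_w = [1.0, -0.5]  # hypothetical hidden reward, used only to simulate labels
    # Stand-in for features of trajectory segments drawn from an offline dataset.
    segments = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_segments)]
    pool = [(a, b) for i, a in enumerate(segments) for b in segments[i + 1:]]

    def simulated_label(fa, fb):
        ra = sum(w * x for w, x in zip(true_w, fa))
        rb = sum(w * x for w, x in zip(true_w, fb))
        return 1.0 if ra > rb else 0.0

    def fit_ensemble(prefs):
        # Each member sees a bootstrap resample of the labeled preferences.
        return [train_member([rng.choice(prefs) for _ in prefs], 2, rng)
                for _ in range(n_members)]

    first = rng.choice(pool)
    pool.remove(first)
    prefs = [(first[0], first[1], simulated_label(*first))]
    for _ in range(n_queries):
        ensemble = fit_ensemble(prefs)
        query = pick_query(ensemble, pool)  # pool-based active query selection
        pool.remove(query)
        prefs.append((query[0], query[1], simulated_label(*query)))
    return fit_ensemble(prefs)


if __name__ == "__main__":
    for member in run_demo():
        print([round(x, 2) for x in member])
```

Running the script prints one learned weight vector per ensemble member; the spread across members is a rough proxy for the remaining uncertainty about the reward. In a fuller pipeline, the learned reward would relabel the offline transitions before an offline RL algorithm is run on them.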
[D] Benchmarks and algorithms for small-data regime? • r/MachineLearning
I'm working on a transfer learning thing that is geared towards optimizing the space of models to look at, given that you know something about e.g. the amount of training data you will receive, examples of the kind of covariate shift or nonstationarity effects you expect to exist, etc. It's a kind of learned regularization trick, and the best results seem to be when you know you'll only ever have 10-100 points of training data (you can also run it in unsupervised or semi-supervised modes, asking it to do the best it can with 10 labeled points and 100 points with no labels, for example). The aim is to submit to NIPS 2018. Currently when I do performance comparisons I've mostly used the basic Kaggle standbys - linear SVC, RBF-kernel SVC, kNN, RandomForest, and XGBoost. What kinds of algorithms would you (as a reviewer) expect to see comparisons with in this problem space?